christoffel function
- Europe > France > Île-de-France > Paris > Paris (0.05)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (2 more...)
- Europe > Slovakia (0.04)
- North America > United States > Texas > Brazos County > College Station (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions
We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and, multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of \textit{generalized Christoffel functions} and show how these can be used to optimize the sampling measures. We prove that this leads to near-optimal sample complexity in various important cases. This paper focuses on applications in scientific computing, where active learning is often desirable, since it is usually expensive to generate data. We demonstrate the efficacy of our framework for gradient-augmented learning with polynomials, Magnetic Resonance Imaging (MRI) using generative models and adaptive sampling for solving PDEs using Physics-Informed Neural Networks (PINNs).
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Indiana (0.04)
- (2 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
- (2 more...)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > Canada (0.04)
- North America > United States > New Jersey > Mercer County > Princeton (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (2 more...)
Leveraging the Christoffel Function for Outlier Detection in Data Streams
Ducharlet, Kévin, Travé-Massuyès, Louise, Lasserre, Jean-Bernard, Lann, Marie-Véronique Le, Miloudi, Youssef
Outlier detection holds significant importance in the realm of data mining, particularly with the growing pervasiveness of data acquisition methods. The ability to identify outliers in data streams is essential for maintaining data quality and detecting faults. However, dealing with data streams presents challenges due to the non-stationary nature of distributions and the ever-increasing data volume. While numerous methods have been proposed to tackle this challenge, a common drawback is the lack of straightforward parameterization in many of them. This article introduces two novel methods: DyCF and DyCG. DyCF leverages the Christoffel function from the theory of approximation and orthogonal polynomials. Conversely, DyCG capitalizes on the growth properties of the Christoffel function, eliminating the need for tuning parameters. Both approaches are firmly rooted in a well-defined algebraic framework, meeting crucial demands for data stream processing, with a specific focus on addressing low-dimensional aspects and maintaining data history without memory cost. A comprehensive comparison between DyCF, DyCG, and state-of-the-art methods is presented, using both synthetic and real industrial data streams. The results show that DyCF outperforms fine-tuning methods, offering superior performance in terms of execution time and memory usage. DyCG performs less well, but has the considerable advantage of requiring no tuning at all.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (4 more...)
- Research Report > Promising Solution (0.54)
- Research Report > New Finding (0.34)
Chameleon: A Flexible Data-mixing Framework for Language Model Pretraining and Finetuning
Xie, Wanyun, Tonin, Francesco, Cevher, Volkan
Training data mixtures greatly impact the generalization performance of large language models. Existing domain reweighting methods often rely on costly weight computations and require retraining when new data is introduced. To this end, we introduce a flexible and efficient data mixing framework, Chameleon, that employs leverage scores to quantify domain importance within a learned embedding space. We first construct a domain affinity matrix over domain embeddings. The induced leverage scores determine a mixture that upweights domains sharing common representations in embedding space. This formulation allows direct transfer to new data by computing the new domain embeddings. In experiments, we demonstrate improvements over three key scenarios: (i) our computed weights improve performance on pretraining domains with a fraction of the compute of existing methods; (ii) Chameleon can adapt to data changes without proxy retraining, boosting few-shot reasoning accuracies when transferred to new data; (iii) our method enables efficient domain reweighting in finetuning, consistently improving test perplexity on all finetuning domains over uniform mixture. Our code is available at https://github.com/LIONS-EPFL/Chameleon.
- North America > United States (0.28)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- (2 more...)
- Education (0.46)
- Government (0.46)
CS4ML: A general framework for active learning with arbitrary data based on Christoffel functions
We introduce a general framework for active learning in regression problems. Our framework extends the standard setup by allowing for general types of data, rather than merely pointwise samples of the target function. This generalization covers many cases of practical interest, such as data acquired in transform domains (e.g., Fourier data), vector-valued data (e.g., gradient-augmented data), data acquired along continuous curves, and, multimodal data (i.e., combinations of different types of measurements). Our framework considers random sampling according to a finite number of sampling measures and arbitrary nonlinear approximation spaces (model classes). We introduce the concept of \textit{generalized Christoffel functions} and show how these can be used to optimize the sampling measures.